4.2.2.2.4  Converting SAM files into BAM files

SAM files are plain text, so we convert them into the compressed binary BAM format to
save storage space. BAM files can also be processed faster. The following script creates
the directory “bam” and converts each SAM file into a BAM file:

mkdir -p bam
cd sam
for f in *.sam
do
    i=${f%.sam}
    # -b writes compressed BAM; the input format is detected automatically
    samtools view -b -o "../bam/${i}.bam" "${i}.sam"
done
cd ..
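
The conversion step can also be wrapped in small reusable functions. The following is a minimal sketch, not the book's script: the helper names `bam_path_for` and `convert_sam_dir` are hypothetical, and it assumes that compressed output is wanted, so it passes `-b` to `samtools view`. The output name is derived with shell parameter expansion, which also tolerates file names that contain spaces.

```shell
#!/bin/sh
# Hypothetical helper: map a SAM path to the matching BAM path in a target directory.
bam_path_for() {
    sam=$1
    outdir=$2
    base=${sam##*/}                      # drop any leading directory
    printf '%s/%s.bam\n' "$outdir" "${base%.sam}"
}

# Hypothetical helper: convert every SAM file in a directory to compressed BAM.
convert_sam_dir() {
    indir=$1
    outdir=$2
    mkdir -p "$outdir"
    for sam in "$indir"/*.sam; do
        [ -e "$sam" ] || continue        # the glob matched nothing
        samtools view -b -o "$(bam_path_for "$sam" "$outdir")" "$sam"
    done
}
```

Run from the project directory, `convert_sam_dir sam bam` would perform the same conversion without changing into the “sam” directory.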

4.2.2.2.5  Sorting and indexing alignments in the BAM files

The alignments in each BAM file must be sorted by their position along the reference
genome (coordinate order) before they can be used in the downstream analysis. The
following bash script uses samtools to sort and index the BAM files and stores the results
in a new directory called “sortedbam”:

mkdir -p sortedbam
cd bam
for i in *.bam
do
    # -T sets the prefix for temporary files written during sorting
    samtools sort -T ../sortedbam/tmp.sort -o "../sortedbam/${i}" "${i}"
    samtools index "../sortedbam/${i}"
done
cd ..
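
Before moving on, it can be worth confirming that a file is really coordinate-sorted. `samtools sort` records the sort order in the `SO` field of the `@HD` header line, and `samtools view -H` prints the header. The following is a minimal sketch of that check; the function name `sort_order_from_header` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a SAM/BAM header on stdin and print the value of
# the SO (sort order) field of the @HD line, or "unknown" if it is absent.
sort_order_from_header() {
    awk -F'\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
            }
        }
        END { if (!found) print "unknown" }
    '
}
```

For example, `samtools view -H sortedbam/sample.bam | sort_order_from_header` should print `coordinate` for a file produced by the sorting step; `sample.bam` is a placeholder file name.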

4.2.2.2.6  Extracting a chromosome or an interval

Most studies aim to identify variants across the whole genome, but some focus on a
specific chromosome or genomic interval. If your target is the whole genome, you can
skip this step. Keep in mind that identifying variants across the whole genome requires
large amounts of memory and storage space. Therefore, for demonstration and for the sake
of simplicity, we will focus only on chromosome 21 of the human genome. In the following,
we create a directory “chr21”, use samtools to extract the alignments on chromosome 21
from each sorted BAM file, and index the resulting BAM files. The region name must match
the sequence name used in the reference genome (here “chr21”):

mkdir -p chr21
cd sortedbam
for f in *.bam
do
    # Region queries rely on the index (.bai) created in the previous step
    samtools view -b "${f}" chr21 > "../chr21/${f}"
    samtools index "../chr21/${f}"
done
cd ..
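
The region argument only matches if the reference names its sequences the same way: some references use “chr21” and others use “21”. `samtools idxstats` lists the reference sequence names of an indexed BAM file in its first tab-separated column, so the name can be looked up rather than assumed. The following is a minimal sketch; the function name `find_contig` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `samtools idxstats` output on stdin and print the
# sequence name that corresponds to the requested chromosome, with or without
# a "chr" prefix. Prints nothing if no sequence matches.
find_contig() {
    want=$1
    awk -F'\t' -v want="$want" '
        # Appending "" forces a string comparison rather than a numeric one
        ($1 "") == (want "") || $1 == ("chr" want) { print $1; exit }
    '
}
```

For example, `samtools idxstats sortedbam/sample.bam | find_contig 21` prints the name to pass to `samtools view` as the region; `sample.bam` is a placeholder file name.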